HES 505 Fall 2024: Session 21
By the end of today you should be able to:
Describe and implement overlay analyses
Extend overlay analysis to statistical modeling
Generate spatial predictions from statistical models
Overlays
Methods for site selection and for identifying suitable locations
Apply a common scale to diverse or dissimilar outputs
1. Define the problem.
2. Break the problem into submodels.
3. Determine significant layers.
4. Reclassify or transform the data within a layer.
5. Add or combine the layers.
6. Verify the results.
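Here is a minimal sketch of the reclassify/transform and combine steps. Two hypothetical raster layers (slope in degrees and distance to a road in meters) are stored as small base-R matrices. Each is rescaled to a common 0-1 scale with min-max scaling, which is one common choice assumed here (reclassification tables are another). The layers are then added cell by cell. The layer names, the values, and the `rescale01()` helper are illustrative and do not come from the course data.

```r
# Hypothetical 3 x 4 raster layers stored as base-R matrices (one value per cell)
slope_deg <- matrix(c( 2,  5, 12, 30,
                       8, 15, 22,  3,
                       1, 40, 10,  6), nrow = 3, byrow = TRUE)
dist_road_m <- matrix(c(  50,  800, 1500, 300,
                         200,  100, 2500, 900,
                        1200,   60,  400, 700), nrow = 3, byrow = TRUE)

# Reclassify/transform: min-max rescale a layer to 0-1,
# flipping direction when needed so that 1 always means "more suitable"
rescale01 <- function(x, higher_is_better = TRUE) {
  z <- (x - min(x)) / (max(x) - min(x))
  if (higher_is_better) z else 1 - z
}
slope_s <- rescale01(slope_deg, higher_is_better = FALSE)   # flatter is better
road_s  <- rescale01(dist_road_m, higher_is_better = FALSE) # closer to a road is better

# Combine: add the rescaled layers cell by cell (result stays a 3 x 4 matrix)
suitability <- slope_s + road_s
round(suitability, 2)
```

Higher values of `suitability` mark cells that do well on both criteria. In this toy grid, the flat cell closest to a road, `suitability[1, 1]`, scores highest.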
Successive disqualification of areas
Series of “yes/no” questions
“Sieve” mapping
Reclassifying
Which types of land are appropriate (1) and which are not (0)?
Boolean overlays assume that relationships are truly Boolean
No measurement error
Categorical measurements are known exactly
Boundaries are well-represented
\[ \begin{equation} F(\mathbf{s}) = \prod_{i=1}^{m}X_i(\mathbf{s}) \end{equation} \]
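The product form can be checked on a toy example. This sketch assumes two hypothetical criteria on a 2 x 3 grid of cells: slope under 10 degrees, and land cover that is forest or grass. Each criterion is reclassified to 1 (yes) or 0 (no). The layers are then multiplied, so that a single 0 removes the cell. This is the sieve described above.

```r
# Hypothetical 2 x 3 criterion layers (one value per raster cell)
slope_deg <- matrix(c( 2, 5, 12,
                      30, 8, 15), nrow = 2, byrow = TRUE)
landcover <- matrix(c("forest", "grass", "grass",
                      "urban",  "grass", "water"), nrow = 2, byrow = TRUE)

# Reclassify each layer into a yes/no (1/0) layer X_i(s); comparisons keep the matrix shape
X1 <- (slope_deg < 10) * 1                                # slope under 10 degrees?
X2 <- (landcover == "forest" | landcover == "grass") * 1  # acceptable land cover?

# Boolean overlay: F(s) is the product of all X_i(s), so any single 0 disqualifies the cell
F_s <- Reduce(`*`, list(X1, X2))
F_s
```

`Reduce()` applies the product across a list of any length. Adding a third criterion therefore only means appending its 0/1 layer to the list.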
\[ \begin{equation} F(\mathbf{s}) = f(w_1X_1(\mathbf{s}), w_2X_2(\mathbf{s}), w_3X_3(\mathbf{s}), ..., w_mX_m(\mathbf{s})) \end{equation} \]
\(F(\mathbf{s})\) does not have to be binary (could be ordinal or continuous)
\(X_m(\mathbf{s})\) could also be extended beyond simply ‘suitable/not suitable’
Weights let each layer contribute according to its relative importance
Other functions for combining inputs (\(X_m(\mathbf{s})\))
\[ \begin{equation} F(\mathbf{s}) = \frac{\sum_{i=1}^{m}w_iX_i(\mathbf{s})}{\sum_{i=1}^{m}w_i} \end{equation} \]
\(F(\mathbf{s})\) is now an index based on the values of \(X_i(\mathbf{s})\)
\(w_i\) can incorporate weights of evidence, uncertainty, or different participant preferences
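Dividing by \(\sum_{i=1}^{m}w_i\) turns the sum into a weighted mean, so \(F(\mathbf{s})\) stays on the same scale as the inputs and only the relative sizes of the weights matter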
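Here is a minimal sketch of the weighted linear combination. It assumes three hypothetical layers that have already been rescaled to 0-1 and stored as base-R matrices, with illustrative weights of 3, 1, and 2. The hypothetical helper `weighted_index()` computes the numerator cell by cell and then divides by the sum of the weights.

```r
# Hypothetical 2 x 2 layers already on a 0-1 scale (filled column by column)
X1 <- matrix(c(0.9, 0.2, 0.5, 1.0), nrow = 2)
X2 <- matrix(c(0.1, 0.8, 0.5, 0.0), nrow = 2)
X3 <- matrix(c(0.4, 0.6, 1.0, 0.3), nrow = 2)
layers <- list(X1, X2, X3)
w <- c(3, 1, 2)  # hypothetical relative importance of X1, X2, X3

weighted_index <- function(layers, w) {
  num <- Reduce(`+`, Map(`*`, w, layers))  # sum_i w_i * X_i(s), cell by cell
  num / sum(w)                             # normalize by the sum of the weights
}

F_s <- weighted_index(layers, w)
round(F_s, 3)
```

Because of the division, multiplying every weight by the same constant leaves \(F(\mathbf{s})\) unchanged. Equal weights reduce it to the ordinary mean of the layers.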
\[ \begin{equation} F(\mathbf{s}) = w_0 + \sum_{i=1}^{m}w_iX_i(\mathbf{s}) + \epsilon \end{equation} \]
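If we estimate the \(w_i\) from data, \(F(\mathbf{s})\) becomes the response variable in a regression
When \(F(\mathbf{s})\) is binary → logistic regression
When \(F(\mathbf{s})\) is continuous → linear regression (or gamma regression for positive, right-skewed values)
When \(F(\mathbf{s})\) is a count → Poisson regression
Assumptions about \(\epsilon\) matter! The error distribution (and, for logistic and Poisson models, the link function) determines which model is appropriate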
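This sketch illustrates estimating the \(w_i\) from data instead of choosing them by hand. It simulates two hypothetical predictors at 200 sampled locations, using assumed true weights \(w_0 = 0.5\), \(w_1 = 1.5\), and \(w_2 = -1\). From these it generates a continuous, a binary, and a count outcome, then fits the matching model with base R's `lm()` and `glm()`. For the logistic and Poisson models, the weighted sum lives on the link scale (log-odds or log), not on the outcome itself.

```r
set.seed(505)
n  <- 200
X1 <- runif(n)                      # hypothetical predictor values at n sampled locations
X2 <- runif(n)
eta <- 0.5 + 1.5 * X1 - 1.0 * X2    # assumed "true" w0, w1, w2, used only to simulate data

F_cont  <- eta + rnorm(n, sd = 0.3)                 # continuous outcome with normal errors
F_bin   <- rbinom(n, size = 1, prob = plogis(eta))  # binary outcome (logit link)
F_count <- rpois(n, lambda = exp(eta))              # count outcome (log link)

# The estimated coefficients play the role of w0, w1, w2
fit_lm    <- lm(F_cont ~ X1 + X2)
fit_logit <- glm(F_bin ~ X1 + X2, family = binomial(link = "logit"))
fit_pois  <- glm(F_count ~ X1 + X2, family = poisson(link = "log"))
round(rbind(linear = coef(fit_lm), logistic = coef(fit_logit), poisson = coef(fit_pois)), 2)
```

For positive, right-skewed continuous outcomes, `glm(..., family = Gamma(link = "log"))` fits the gamma alternative.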
To identify important correlations between predictors and the occurrence of an event
Generate maps of the ‘range’ or ‘niche’ of events
Understand spatial patterns of event co-occurrence
Forecast changes in event distributions
From Long
Spatially referenced locations of events \((\mathbf{y})\) sampled from the study extent
A matrix of predictors \((\mathbf{X})\) that can be assigned to each event based on spatial location
Goal: Estimate the probability of occurrence of events across unsampled regions of the study area based on correlations with predictors
Random or systematic sample of the study region
The presence (or absence) of the event is recorded for each point
Hypothesized predictors of occurrence are measured (or extracted) at each point
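The two sampling designs can be sketched on a hypothetical study region of 40 x 60 cells with one simulated predictor layer (`temp`, stored as a matrix). The random design draws 50 cells without replacement. The systematic design places points on a regular spacing of 8 rows by 10 columns, starting from a random offset. The grid size, spacing, and sample sizes are illustrative assumptions. The final step mirrors extracting predictor values at each sampled point.

```r
set.seed(1)
nr <- 40; nc <- 60                                            # hypothetical study region
temp <- matrix(rnorm(nr * nc, mean = 10, sd = 3), nrow = nr)  # hypothetical predictor layer

# Simple random sample: 50 distinct cells, converted from linear index to (row, col)
rand_rc <- arrayInd(sample(nr * nc, 50), .dim = c(nr, nc))

# Systematic sample: every 8th row and every 10th column, from a random start
start_r <- sample(8, 1)
start_c <- sample(10, 1)
sys_rc <- as.matrix(expand.grid(row = seq(start_r, nr, by = 8),
                                col = seq(start_c, nc, by = 10)))

# Extract the predictor value at each sampled point (matrix indexing by row/col pairs)
rand_temp <- temp[rand_rc]
sys_temp  <- temp[sys_rc]
```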
We can model favorability as the probability of occurrence using logistic regression
The inverse of the link function maps the linear predictor \((\mathbf{x_i}'\beta + \alpha)\) onto the (0, 1) support of probabilities
Estimates of \(\beta\) can then be used to generate ‘wall-to-wall’ spatial predictions
\[ \begin{equation} \begin{aligned} y_{i} &\sim \text{Bern}(p_i)\\ \text{link}(p_i) &= \mathbf{x_i}'\beta + \alpha \end{aligned} \end{equation} \]
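Here is a small numeric check of the logit link. It borrows the coefficient estimates reported for `logistic.simple` later in these notes and assumes hypothetical standardized predictor values at one location. The linear predictor can be any real number. `plogis()` (the inverse logit) squeezes it into (0, 1), and `qlogis()` (the logit) maps the probability back.

```r
# Coefficients from the logistic.simple output; predictor values at location i are hypothetical
alpha <- -0.05085
beta  <- c(MeanAnnTemp = 0.35247, TotalPrecip = -0.18792)
x_i   <- c(MeanAnnTemp = 1.2, TotalPrecip = -0.5)

eta_i <- sum(x_i * beta) + alpha  # linear predictor x_i' beta + alpha (any real number)
p_i   <- plogis(eta_i)            # inverse logit: exp(eta) / (1 + exp(eta)), always in (0, 1)
p_i
qlogis(p_i)                       # logit link: log(p / (1 - p)) recovers eta_i
```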
Inputs from the dismo package
The sample data
Simple feature collection with 6 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -74.94368 ymin: -14.5071 xmax: -71.98333 ymax: -12.85
Geodetic CRS: WGS 84
y geometry
804 0 POINT (-74.94368 -13.30268)
419 1 POINT (-74.85 -12.85)
683 0 POINT (-74.06374 -13.52932)
563 1 POINT (-71.98333 -13.51667)
767 0 POINT (-72.06618 -14.5071)
267 1 POINT (-73.2758 -13.8255)
Building our dataframe
  ID MeanAnnTemp TotalPrecip PrecipWetMonth PrecipDryMonth MinTempCold TempRange
1  1    4.975000         760            151              3        -5.9       6.3
2  2    6.391667         830            146             10        -4.1       5.7
3  3   11.816667         845            181              7         1.3       4.6
4  4   11.241667         694            150              4        -0.4       6.6
5  5    6.875000         909            199              4        -6.4       8.3
6  6    5.750000        1002            190             12        -5.4       6.4
Building our dataframe: standardized predictors
ID MeanAnnTemp TotalPrecip PrecipWetMonth
Min. : 1.00 Min. :-1.5372 Min. :-2.40213 Min. :-3.1103
1st Qu.:19.25 1st Qu.:-0.6294 1st Qu.:-0.51638 1st Qu.:-0.4595
Median :37.50 Median :-0.2416 Median : 0.02767 Median : 0.1910
Mean :37.50 Mean : 0.0000 Mean : 0.00000 Mean : 0.0000
3rd Qu.:55.75 3rd Qu.: 0.5984 3rd Qu.: 0.41766 3rd Qu.: 0.5697
Max. :74.00 Max. : 3.5301 Max. : 5.24357 Max. : 3.2982
PrecipDryMonth MinTempCold TempRange
Min. :-0.8282 Min. :-1.4490 Min. :-2.38574
1st Qu.:-0.6660 1st Qu.:-0.7159 1st Qu.:-0.69892
Median :-0.2607 Median :-0.2996 Median :-0.09947
Mean : 0.0000 Mean : 0.0000 Mean : 0.00000
3rd Qu.: 0.4283 3rd Qu.: 0.7033 3rd Qu.: 0.80668
Max. : 5.0085 Max. : 3.5710 Max. : 1.90799
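Every predictor in the summary above has a mean of 0, which is consistent with standardizing (centering and scaling) the predictors. The exact call used is not shown. The sketch below shows one way to do this in base R, using the first six rows of the raw table and two of its columns. The ID column is left unchanged.

```r
# First six rows of the raw predictor table (two predictor columns only, for brevity)
raw <- data.frame(
  ID          = 1:6,
  MeanAnnTemp = c(4.975000, 6.391667, 11.816667, 11.241667, 6.875000, 5.750000),
  TotalPrecip = c(760, 830, 845, 694, 909, 1002)
)

# Center each predictor to mean 0 and scale to SD 1 (same values as scale()); keep ID as is
std <- raw
std[-1] <- lapply(raw[-1], function(v) (v - mean(v)) / sd(v))
summary(std)
```

If the model is later used to predict across rasters, the prediction layers must be transformed with the same means and standard deviations as the sample points. Otherwise, the coefficients are applied on the wrong scale.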
Looking at correlations
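This sketch shows one way to screen predictors for correlation. It builds a hypothetical set of 74 points in which `MinTempCold` is simulated to track `MeanAnnTemp` closely, and computes the Pearson correlation matrix with `cor()`. It then lists each pair whose absolute correlation exceeds 0.7. That cutoff is a commonly used rule of thumb rather than a fixed standard.

```r
set.seed(7)
n <- 74                                                 # same number of points as the summary above
MeanAnnTemp <- rnorm(n)
MinTempCold <- 0.95 * MeanAnnTemp + rnorm(n, sd = 0.2)  # hypothetical: closely tracks mean temperature
TotalPrecip <- rnorm(n)
preds <- data.frame(MeanAnnTemp, MinTempCold, TotalPrecip)

r <- cor(preds)                                         # Pearson correlation matrix
round(r, 2)

# Flag each pair once (upper triangle only) whose |r| exceeds the chosen threshold
high <- which(abs(r) > 0.7 & upper.tri(r), arr.ind = TRUE)
data.frame(var1 = rownames(r)[high[, "row"]],
           var2 = colnames(r)[high[, "col"]],
           r    = r[high])
```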
Fitting some models
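# Append the presence/absence response (pres.abs$y) as column 8 and name it "y"
pts.df <- cbind(pts.df, pres.abs$y)
colnames(pts.df)[8] <- "y"
# Fit three candidate models; pts.df[, 2:8] drops the ID column (y ~ . uses all six predictors)
logistic.global <- glm(y ~ ., family = binomial(link = "logit"), data = pts.df[, 2:8])
logistic.simple <- glm(y ~ MeanAnnTemp + TotalPrecip, family = binomial(link = "logit"), data = pts.df[, 2:8])
logistic.rich <- glm(y ~ MeanAnnTemp + PrecipWetMonth + PrecipDryMonth, family = binomial(link = "logit"), data = pts.df[, 2:8])
Checking out the results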
Call:
glm(formula = y ~ ., family = binomial(link = "logit"), data = pts.df[,
2:8])
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.08264 0.25052 -0.330 0.741
MeanAnnTemp -0.62712 3.36175 -0.187 0.852
TotalPrecip -1.41121 0.86351 -1.634 0.102
PrecipWetMonth 0.71202 0.50360 1.414 0.157
PrecipDryMonth 1.06540 0.80825 1.318 0.187
MinTempCold 2.02792 4.88923 0.415 0.678
TempRange 1.84210 1.98680 0.927 0.354
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 102.532 on 73 degrees of freedom
Residual deviance: 92.993 on 67 degrees of freedom
AIC: 106.99
Number of Fisher Scoring iterations: 4
Checking out the results
Call:
glm(formula = y ~ MeanAnnTemp + TotalPrecip, family = binomial(link = "logit"),
data = pts.df[, 2:8])
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.05085 0.23676 -0.215 0.830
MeanAnnTemp 0.35247 0.24825 1.420 0.156
TotalPrecip -0.18792 0.24423 -0.769 0.442
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 102.532 on 73 degrees of freedom
Residual deviance: 99.953 on 71 degrees of freedom
AIC: 105.95
Number of Fisher Scoring iterations: 4
Checking out the results
Call:
glm(formula = y ~ MeanAnnTemp + PrecipWetMonth + PrecipDryMonth,
family = binomial(link = "logit"), data = pts.df[, 2:8])
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.05522 0.23813 -0.232 0.8166
MeanAnnTemp 0.51136 0.28979 1.765 0.0776 .
PrecipWetMonth 0.15989 0.27013 0.592 0.5539
PrecipDryMonth -0.33762 0.28827 -1.171 0.2415
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 102.532 on 73 degrees of freedom
Residual deviance: 99.104 on 70 degrees of freedom
AIC: 107.1
Number of Fisher Scoring iterations: 4
Comparing models
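The three summaries report the residual deviance and AIC for each model. For a binomial GLM fit to 0/1 data, the saturated model has a log-likelihood of 0. AIC therefore equals the residual deviance plus twice the number of estimated coefficients \(k\). The snippet below recomputes the reported AIC values from the printed deviances and ranks the models. The coefficient counts of 7, 3, and 4 match the residual degrees of freedom of 67, 71, and 70 with 74 points.

```r
# Residual deviances copied from the three summary() outputs above
fits <- data.frame(
  model          = c("logistic.global", "logistic.simple", "logistic.rich"),
  resid_deviance = c(92.993, 99.953, 99.104),
  k              = c(7, 3, 4)  # intercept + slopes
)

# Binary-response binomial GLM: AIC = -2 logLik + 2k = residual deviance + 2k
fits$AIC       <- fits$resid_deviance + 2 * fits$k
fits$delta_AIC <- fits$AIC - min(fits$AIC)
fits[order(fits$AIC), ]
```

`logistic.simple` has the lowest AIC, but all three models lie within about 1.2 units of each other. A common rule of thumb treats AIC differences below 2 as little evidence for preferring one model over another. With the fitted objects in memory, `AIC(logistic.global, logistic.simple, logistic.rich)` from the stats package returns the same comparison directly.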
Generating predictions
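The prediction step can be sketched without spatial packages by storing a raster as a data frame with one row per cell. The sketch assumes a hypothetical 20 x 30 grid with two simulated, standardized layers named after the course predictors. It simulates presence/absence at 75 sampled cells from an assumed model and fits a logistic regression. It then predicts for every cell with `predict(..., type = "response")`, which applies the inverse link so that each prediction is a probability.

```r
set.seed(21)
nr <- 20; nc <- 30
# Hypothetical study area: one row per raster cell, two standardized predictor layers
grid <- expand.grid(col = 1:nc, row = 1:nr)  # col varies fastest, so cells run row by row
grid$MeanAnnTemp <- as.numeric(scale(grid$col + rnorm(nr * nc, sd = 3)))
grid$TotalPrecip <- as.numeric(scale(grid$row + rnorm(nr * nc, sd = 3)))

# Hypothetical presence/absence at 75 sampled cells, simulated from an assumed model
train <- grid[sample(nr * nc, 75), ]
train$y <- rbinom(nrow(train), size = 1,
                  prob = plogis(-0.2 + 1.0 * train$MeanAnnTemp - 0.5 * train$TotalPrecip))

fit <- glm(y ~ MeanAnnTemp + TotalPrecip, family = binomial(link = "logit"), data = train)

# Wall-to-wall prediction: type = "response" returns probabilities (inverse link applied)
grid$p_hat <- predict(fit, newdata = grid, type = "response")
p_map <- matrix(grid$p_hat, nrow = nr, ncol = nc, byrow = TRUE)  # back to raster layout
```

`image(t(p_map))` gives a quick map. With real `SpatRaster` layers, the terra package's `predict()` method can do the same in one call, passing `type = "response"` through to `predict.glm()`. In that case, the layer names must match the model's predictor names.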
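Assumptions of logistic regression
Dependent variable must be binary
Observations must be independent (important for spatial analyses, where nearby points tend to be similar)
Predictors should not be strongly collinear
Predictors should be linearly related to the log-odds
Sample size must be adequate for the number of predictors (a common rule of thumb is at least 10 observations of the rarer outcome per predictor)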
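One way to check the collinearity assumption is the variance inflation factor, \(\text{VIF}_j = 1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing predictor \(j\) on all the other predictors. The hypothetical `vif_manual()` helper below computes it with base R. It runs on simulated data in which `MinTempCold` nearly duplicates `MeanAnnTemp`. The `car` package's `vif()` offers a ready-made version for fitted models. Cutoffs of 5 or 10 are common rules of thumb, not strict limits.

```r
set.seed(505)
n <- 74
d <- data.frame(MeanAnnTemp = rnorm(n), TotalPrecip = rnorm(n))
d$MinTempCold <- 0.95 * d$MeanAnnTemp + rnorm(n, sd = 0.2)  # hypothetical near-duplicate predictor

# VIF_j = 1 / (1 - R^2_j): regress each predictor on all the others
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    f  <- reformulate(setdiff(names(df), v), response = v)
    r2 <- summary(lm(f, data = df))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(d)
round(vifs, 1)
```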